Using a Probabilistic Translation Model for Cross-Language Information Retrieval

Authors

  • Jian-Yun Nie
  • Pierre Isabelle
  • George F. Foster
Abstract

There is an increasing need for document search mechanisms capable of matching a natural language query with documents written in a different language. Recently, we conducted several experiments aimed at comparing various methods of incorporating a cross-linguistic capability into existing information retrieval (IR) systems. Our results indicate that translating queries with off-the-shelf machine translation systems can result in relatively good performance. But the results also indicate that other methods can perform even better. More specifically, we tested a probabilistic translation model of the kind proposed by Brown et al. [2]. The parameters of that system had been estimated automatically on a different, unrelated corpus of parallel texts. After we augmented it with a small bilingual dictionary, this probabilistic translation model outperformed machine translation systems on our cross-language IR task.
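The idea in the abstract can be illustrated with a minimal sketch. It assumes a word-level translation table t(target | source) of the kind used in the simplest models of Brown et al., and it assumes one particular way of adding a bilingual dictionary (giving each dictionary pair extra weight and renormalizing). The abstract does not say how the dictionary was actually combined with the model, so the function names, the `boost` parameter, and the toy probabilities below are all hypothetical.

```python
from collections import defaultdict

# Toy translation table t(target | source); the values are invented for illustration.
T_TABLE = {
    "drug": {"drogue": 0.6, "medicament": 0.3, "de": 0.1},
    "traffic": {"trafic": 0.7, "circulation": 0.3},
}

# Toy bilingual dictionary (hypothetical entries).
DICTIONARY = {"drug": ["medicament"]}


def augment_with_dictionary(t_table, dictionary, boost=0.5):
    """Return a new table in which dictionary pairs receive extra weight.

    Assumption: each dictionary pair gets `boost` added to its score, and
    every row is then renormalized so that it sums to 1.
    """
    merged = {src: dict(targets) for src, targets in t_table.items()}
    for src, targets in dictionary.items():
        row = merged.setdefault(src, {})
        for tgt in targets:
            row[tgt] = row.get(tgt, 0.0) + boost
    for src, row in merged.items():
        total = sum(row.values())
        merged[src] = {tgt: p / total for tgt, p in row.items()}
    return merged


def translate_query(query_words, t_table, top_n=3):
    """Rank target words by their average translation probability over the query.

    This treats every query word as an equally likely source of each
    target word; query words missing from the table contribute nothing.
    """
    scores = defaultdict(float)
    for word in query_words:
        for tgt, p in t_table.get(word, {}).items():
            scores[tgt] += p / len(query_words)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    augmented = augment_with_dictionary(T_TABLE, DICTIONARY)
    for word, score in translate_query(["drug", "traffic"], augmented):
        print(f"{word}\t{score:.3f}")
```

The ranked target words would then be submitted as the query to a monolingual IR system. Keeping several weighted candidates per source word, rather than a single translation, is what distinguishes this style of query translation from running the query through a machine translation system.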

Similar resources

Using Structured Queries for Disambiguation in Cross-Language Information Retrieval

Bilingual transfer dictionaries are an important resource for query translation in cross-language text retrieval. However, term translation is not an isomorphic process, so dictionary-based systems must address the problem of ambiguity in language translation. In this paper, we claim that Boolean conjunction (the AND operator) provides simple and automatic disambiguation in the target languag...

Transitive probabilistic CLIR models

Transitive translation could be a useful technique to enlarge the number of supported language pairs for a cross-language information retrieval (CLIR) system in a cost-effective manner. The paper describes several setups for transitive translation based on probabilistic translation models. The transitive CLIR models were evaluated on the CLEF test collection and yielded a retrieval effectivenes...

Structured queries, language modeling, and relevance modeling in cross-language information retrieval

Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in a...

Cross-lingual Information Retrieval Using Hidden Markov Models

This paper presents empirical results in cross-lingual information retrieval using English queries to access Chinese documents (TREC-5 and TREC-6) and Spanish documents (TREC-4). Since our interest is in languages where resources may be minimal, we use an integrated probabilistic model that requires only a bilingual dictionary as a resource. We explore how a combined probability model of term t...

TREC-7 CLIR using a Probabilistic Translation Model

In this report, we describe the approach we used in the TREC-7 Cross-Language IR (CLIR) track. The approach is based on a probabilistic translation model estimated from a parallel training corpus (Canadian HANSARD). The problem of translating a query from one language to another (between French and English) becomes the problem of determining the most probable words that may appear in the translation ...

Publication date: 1998